These are the replication files for the analysis in the paper "Training caretakers to clean community wells is a highly cost-effective way to reduce exposure to coliform bacteria."

Anonymized data are included in the subdirectory /data. Five anonymized datasets are provided. Three contain the observational and survey data collected for each community well in the study in 2021 (baseline), January 2023 (endline 1), and May 2023 (endline 2), respectively. E. coli and total coliform test results have been merged with the well information data, using barcodes to track samples. The other two datasets contain interviews with well caretakers in January 2023 (endline 1) and May 2023 (endline 2). All identifiers, including location, have been removed from all datasets. The survey instruments used to collect these data are included in the subdirectory /questionnaires.

Stata code is provided in the subdirectory /code. Running analysis_master will replicate all figures in the main text (saved to /graphs), all supplementary tables (saved to /tables), and Supplementary Figures 3-5 (saved to /supplementary_graphs). Subfigures in Stata format are saved in /stata_graphs before being merged. Code to produce Supplementary Figures 1 and 2 is provided in the subdirectory /code/supplementary_analysis_code, but the data required to replicate these figures are not included in this replication package. Please also note that the code to produce Supplementary Figure 1 takes around 4 days to run on a standard personal computer.

Data to replicate the maps are not provided, but a copy of Figure 2 is included in the subdirectory /maps. The raw tubewell location data are withheld to preserve the anonymity of communities and individuals linked to the study.








